Evaluating Long-term Spectral Subtraction for Reverberant ASR
Abstract
Even a modest degree of room reverberation can greatly increase the difficulty of automatic speech recognition (ASR). We have observed large increases in speech recognition word error rates when using a far-field (3-6 feet) microphone in a conference room, in comparison with recordings from head-mounted microphones. In this paper, we describe experiments with a proposed remedy based on the subtraction of an estimate of the log spectrum from a long-term (e.g., 2 s) analysis window, followed by overlap-add resynthesis. Since the technique is essentially one of enhancement, the processed signal it generates can be used as input for complete speech recognition systems. Here we report results with both HTK and the SRI Hub-5 recognizer. For simpler recognizer configurations and/or moderate-sized training sets, the improvements are large, while moderate improvements are still observed for more complex configurations under a number of conditions.
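The processing chain named in the abstract (long analysis window, log-spectrum subtraction, overlap-add resynthesis) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the Hann window, the 75% overlap, and the choice of the per-bin mean log magnitude over the whole recording as the "estimate of the log spectrum" are all assumptions, since the abstract does not specify them.

```python
import numpy as np


def long_term_log_spectral_subtraction(x, sr, win_s=2.0, hop_frac=0.25, eps=1e-10):
    """Hypothetical sketch: subtract the mean log magnitude spectrum,
    computed over long analysis windows, then resynthesize by overlap-add."""
    x = np.asarray(x, dtype=float)
    n = int(round(win_s * sr))            # long analysis window, e.g. 2 s
    hop = max(1, int(n * hop_frac))       # assumed 75% overlap
    window = np.hanning(n)                # assumed window shape

    # Zero-pad the tail so the last frame is complete.
    n_frames = 1 + int(np.ceil(max(len(x) - n, 0) / hop))
    total = (n_frames - 1) * hop + n
    xp = np.pad(x, (0, total - len(x)))

    # Long-window short-time Fourier analysis.
    frames = np.stack([xp[i * hop:i * hop + n] * window for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    log_mag = np.log(np.abs(spec) + eps)

    # Assumed estimate: per-bin mean of the log magnitude over all frames.
    mean_log = log_mag.mean(axis=0, keepdims=True)

    # Subtract in the log domain and keep the original phase.
    new_spec = np.exp(log_mag - mean_log) * np.exp(1j * np.angle(spec))
    out_frames = np.fft.irfft(new_spec, n=n, axis=1)

    # Weighted overlap-add resynthesis, normalized by the summed squared window.
    y = np.zeros(total)
    norm = np.zeros(total)
    for i in range(n_frames):
        y[i * hop:i * hop + n] += out_frames[i] * window
        norm[i * hop:i * hop + n] += window ** 2
    y /= np.maximum(norm, eps)
    return y[:len(x)]
```

Because the output is an ordinary waveform, it could in principle be passed to any recognizer front end, which matches the abstract's point that the technique is one of enhancement. One visible side effect of subtracting the mean log spectrum is that a constant gain applied to the input cancels out.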
Similar resources
Microsoft Word - gillespie-03-icassp
* Currently affiliated with Microsoft Corporation. Abstract: We showed in [1] that penalizing long-term reverberation energy is more effective than maximizing the signal-to-reverberation ratio (SRR) for improving audible quality and automatic speech recognition (ASR) accuracy. Using this knowledge we propose a blind approach to speech dereverberation th...
Speech Recognition by Denoising and Dereverberation Based on Spectral Subtraction in a Real Noisy Reverberant Environment
A blind dereverberation method based on spectral subtraction using a multi-channel least mean squares algorithm was previously proposed. The results of a large vocabulary continuous speech recognition task showed that this method achieved significant improvements over the conventional method based on cepstral mean normalization and beamforming in a simulated reverberant environment without addi...
Robust Speech Recognition Using Speech Enhancement
Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, word recognition performance of these systems has been shown to approach 100%; however, this performance degrades rapidly...
Deep neural network based spectral feature mapping for robust speech recognition
Automatic speech recognition (ASR) systems suffer from performance degradation under noisy and reverberant conditions. In this work, we explore a deep neural network (DNN) based approach for spectral feature mapping from corrupted speech to clean speech. The DNN based mapping substantially reduces interference and produces estimated clean spectral features for ASR training and decoding. We expe...
IMPROVING ASR PERFORMANCE FOR REVERBERANT SPEECH Brian
The performance of current automatic speech recognition (ASR) systems is very sensitive to the presence of room reverberation in the incoming speech signal. We investigate a family of front-end speech representations that focus on slow changes in the gross spectral structure of speech for their ability to improve the robustness of ASR systems to reverberation. A number of the front ends pro...
Publication date: 2001